The (2+p)-Satisfiability (SAT) problem interpolates between different classes of complexity theory
and is believed to be of basic interest in understanding the onset of typical case complexity
in random combinatorics. In this paper, a tricritical point in the phase diagram of the random
2+p-SAT problem is analytically computed using the replica approach and found to lie in the range
$2/5 \le p_0 \le 0.416$. These bounds on $p_0$ are in agreement with previous numerical simulations and
rigorous results.
PACS Numbers: 05.20 - 64.60 - 87.10



I. INTRODUCTION

The satisfiability (SAT) problem [1] is the prototype of NP-complete combinatorial decision problems arising in
theoretical computer science. Such decision problems are, by definition, the most difficult problems solvable in
polynomial time by some ideal non-deterministic algorithm [1]. In practice, however, the performance of real
algorithms may change drastically depending on whether the instances of the problem are highly constrained or not.
Therefore, the worst-case classification on which complexity theory is founded does not necessarily capture the
behaviour of search algorithms in specific applications. For example, random instances of NP-complete decision
problems undergo a dramatic change in the median time required for their solution when the instances are generated
at the boundary of a critical region in the parameter space (for an introduction to these issues, see ref. [2]).

A paradigm for such a behaviour is provided by the random K-Satisfiability (K-SAT) problem. Briefly speaking,
one is given N Boolean variables and a set of M clauses to be satisfied simultaneously. A clause refers to a logical
constraint on K Boolean variables, randomly chosen among the N ones. For large instances ($M, N \to \infty$), K-SAT
exhibits a striking threshold phenomenon as a function of the intensive ratio $\alpha = M/N$. Numerical simulations show
that the probability of finding an assignment of the Boolean variables satisfying all clauses falls abruptly from one
down to zero when $\alpha$ crosses a critical value $\alpha_c(K)$ of the number of clauses per variable [3]. This scenario is
rigorously established in the polynomial K = 2 case, where $\alpha_c(2) = 1$ [4]. For $K \ge 3$, much less is known; K-SAT
with $K \ge 3$ belongs to the NP-complete class, roughly meaning that running times of search algorithms are thought to
scale exponentially in N when the problem instances are critically constrained. Recent numerical works have provided
the estimate $\alpha_c(3) \simeq 4.2 - 4.3$ [3].

A statistical mechanics approach has been attempted to gain insight into the K-SAT problem [5-7]. These studies
rely on the correspondence between solutions and ground states of diluted spin-glass like cost-energy functions.
Threshold phenomena therefore correspond to zero temperature critical points in the phase diagram of the associated
spin glass model. Replica Symmetric (RS) theory gives the correct value of the threshold for K = 2 but fails in
predicting the critical $\alpha_c$ for $K \ge 3$ [6,7]. This stems from the nature of the transition taking place at $\alpha_c$, which is
continuous for K = 2 and appears discontinuous when $K \ge 3$. In the latter case, the precise location of the critical
point for the first order transition would require an appropriate replica symmetry breaking scheme. For interacting
models with finite connectivity, the latter issue is still an open problem in many respects [8].

Very recently [9], it has been suggested that the particular nature - continuous or discontinuous - of the phase
transition taking place at the threshold could be closely connected with the appearance of computationally hard
instances, and hence with the onset of exponential regimes in search algorithms [10]. Recent numerical studies on the
so-called 2+p-SAT problem [9], which smoothly interpolates between 2-SAT (p = 0) and 3-SAT (p = 1) [7], have
strongly supported this statement. It follows that the interest in the precise analytical localisation of discontinuous
transitions in random SAT models goes well beyond the purely technical aspects of the replica formalism.

In this paper, we present the analytical calculation of the tricritical point $p_0$ of the 2+p-SAT model, separating
second-order phase transitions ($0 \le p < p_0$) from first-order ones ($p_0 < p \le 1$). In Section II, we recall the definition
of the 2+p-SAT model. The main steps of the statistical physics analysis are exposed in Section III. In Section IV,
we study the critical region and establish the self-consistent equations fulfilled by the order parameter at threshold.
We analyse these equations and show that $2/5 \le p_0 \le 0.416$. In the conclusion, we underline the agreement between our
result and a recent mathematical study of the 2+p-SAT model.
II. PRESENTATION OF THE 2+p-SAT MODEL



The 2+p-SAT model is a mixed version of 2-SAT and 3-SAT including $(1-p)M$ (resp. $pM$) clauses constraining
two (resp. three) Boolean variables [7].

To start with, we consider a set of N Boolean variables $\{x_i = 0, 1\}_{i=1,\ldots,N}$. We first randomly choose 2 among
the N possible indices i and then, for each of them, a literal $z_i$ that is the corresponding $x_i$ or its negation $\bar{x}_i$ with
equal probabilities one half. A clause C is the logical OR of the 2 previously chosen literals, that is C will be true
(or satisfied) if and only if at least one literal is true. Next, we repeat this process to obtain $(1-p)M$ independently
chosen clauses $\{C_\ell\}_{\ell=1,\ldots,(1-p)M}$ and ask for all of them to be true at the same time, i.e. we take the logical AND of
these clauses, thus obtaining a Boolean expression in the so-called Conjunctive Normal Form (CNF). The resulting
2-CNF formula $F_2$ may be written as

$$F_2 = \bigwedge_{\ell=1}^{(1-p)M} C_\ell = \bigwedge_{\ell=1}^{(1-p)M} \left( \bigvee_{i=1}^{2} z_i^{(\ell)} \right) , \qquad (1)$$

where $\bigwedge$ and $\bigvee$ stand for the logical AND and OR operations respectively.

Then, using the above prescription, we generate a 3-CNF formula, hereafter called $F_3$, including $pM$ clauses of length
three. The resulting Boolean formula F that we shall analyze reads $F = F_2 \wedge F_3$. A logical assignment of the $\{x_i\}$'s
satisfying all clauses, that is evaluating F to true, is called a solution of the satisfiability problem. If no such
assignment exists, F is said to be unsatisfiable. It is worth noticing that, as far as the complexity classification of
the problem is concerned, for any $p > 0$ any instance of the model contains a 3-CNF sub-formula, therefore proving that
the problem itself belongs to the NP-complete class.
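
The construction of F described above can be sketched in a few lines of Python. This is only an illustration under our own conventions, which are not taken from the paper: a clause is stored as a tuple of signed integers (+i for $x_i$, -i for its negation), the helper names are hypothetical, and satisfiability is tested by exhaustive enumeration, which is only feasible for very small N.

```python
import itertools
import random


def random_2p_sat(n_vars, n_clauses, p, rng):
    """Draw a random 2+p-SAT formula with about p*n_clauses 3-clauses.

    Each clause is a tuple of signed integers: +i stands for x_i and
    -i for its negation (i = 1..n_vars). Hypothetical representation.
    """
    n3 = round(p * n_clauses)
    lengths = [2] * (n_clauses - n3) + [3] * n3
    formula = []
    for k in lengths:
        # k distinct variable indices, each negated with probability 1/2
        indices = rng.sample(range(1, n_vars + 1), k)
        formula.append(tuple(i if rng.random() < 0.5 else -i for i in indices))
    return formula


def satisfies(formula, assignment):
    """assignment[i-1] is the truth value of x_i."""
    return all(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in formula
    )


def is_satisfiable(formula, n_vars):
    """Exhaustive search over the 2**n_vars assignments (small n_vars only)."""
    return any(
        satisfies(formula, a)
        for a in itertools.product([False, True], repeat=n_vars)
    )
```

For instance, `random_2p_sat(10, 20, 0.5, random.Random(0))` returns 10 clauses of length two followed by 10 clauses of length three.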

This model exhibits a threshold behaviour, as usual K-SAT instances do [9,11], at a critical ratio $M/N = \alpha_c(p)$, with
$\alpha_c(0) = 1$ and $\alpha_c(1) = \alpha_c^{3\text{-SAT}} \simeq 4.2 - 4.3$. The critical ratio is obviously bounded from above by
$\alpha_c(p) \le 1/(1-p)$, obtained from the requirement that $F_2$, whose number of clauses per variable is $\alpha (1-p)$, has to be
almost surely satisfiable. We shall show in the following that

$$\alpha_c(p) = \frac{1}{1-p} , \qquad (0 \le p \le p_0) , \qquad (2)$$

i.e. that the upper bound is reached when p is smaller than a value $p_0$ lying in the range

$$0.4 \le p_0 \le 0.416 . \qquad (3)$$

Most remarkably, since an earlier presentation of our result [9], a rigorous proof of the equality (2) has been derived
for $p \le 2/5$ based on the analysis of the so-called unit clause algorithm [11].



III. STATISTICAL MECHANICS ANALYSIS 



A. The energy-cost function 



The above mixed random SAT problem can be mapped onto a diluted spin cost-energy function upon introducing
the spin variables, $S_i = 1$ if the Boolean variable $x_i$ is true, $S_i = -1$ if $x_i$ is false, and by taking into account the
clauses through an $M \times N$ random matrix C where $C_{\ell i} = -1$ (respectively $+1$) if clause $C_\ell$ contains $\bar{x}_i$ (resp. $x_i$),
and $0$ otherwise. It can be checked easily that $\sum_{i=1}^N C_{\ell i} S_i$ equals the number of true literals minus the number of
wrong literals in clause $\ell$, so that it reaches $-K$ for a clause of length K if and only if all its literals are wrong.
Then the cost-energy function

$$E[C,S] = \sum_{\ell=1}^{(1-p)M} \delta\left[ \sum_{i=1}^{N} C_{\ell i} S_i \, ; -2 \right] + \sum_{\ell=(1-p)M+1}^{M} \delta\left[ \sum_{i=1}^{N} C_{\ell i} S_i \, ; -3 \right] , \qquad (4)$$

where $\delta[. ; .]$ denotes the Kronecker function, counts the number of violated clauses in the CNF Boolean expression F
for the logical assignment S. The ground state (GS) energy of the cost function (4), i.e. its minimum over S at fixed
C, encodes the existence of satisfying assignments (zero violated clauses, $E_{GS} = 0$) or, if there are none, the minimum
number ($E_{GS} > 0$) of violated clauses.
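
As an illustration of how such a cost function counts violated clauses, here is a minimal Python sketch. It assumes the signed-integer clause representation (+i for $x_i$, -i for its negation) and the sign convention $C_{\ell i} = +1$ for $x_i$, $-1$ for its negation; both are choices made for this example, and the function names are hypothetical.

```python
import itertools


def energy(formula, spins):
    """Number of violated clauses for spins[i-1] = +1 (true) or -1 (false).

    A clause of length K is violated exactly when the sum of C_li * S_i
    over its literals equals -K (assumed sign convention, see lead-in).
    """
    violated = 0
    for clause in formula:
        field = sum((1 if lit > 0 else -1) * spins[abs(lit) - 1] for lit in clause)
        if field == -len(clause):
            violated += 1
    return violated


def ground_state_energy(formula, n_vars):
    """Minimum of the energy over all 2**n_vars spin configurations."""
    return min(
        energy(formula, s) for s in itertools.product([-1, 1], repeat=n_vars)
    )
```

A zero ground state energy signals a satisfiable formula; for example `ground_state_energy([(1,), (-1,)], 1)` is 1, since one of the two contradictory unit clauses is always violated.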

It is worth noticing that, in addition to the usual two-spin interactions that give rise to continuous phase transitions
[13], the energy (4) involves three-spin interactions due to the presence of 3-clauses. The latter can generate
discontinuous phase transitions at sufficiently high concentration, i.e. for large enough p [14]. In the following, we
calculate the value of the tricritical point $p_0$ separating the second order phase transitions from the first order
ones on the threshold line $\alpha_c(p)$.
B. The average over the disorder



Resorting to the replica method for diluted spin-glasses and following ref. [7], one proceeds by computing the model
"free-energy" density at inverse temperature $\beta$, averaged over the clause distribution, $F(\beta) = -\frac{1}{\beta N} \overline{\ln Z[C]}$, where
$Z[C]$ is the partition function. The overline denotes the average over the random clause matrices C and is performed
using the replica trick $\overline{\ln Z} = \lim_{n \to 0} (\overline{Z^n} - 1)/n$, starting from integer values of n. The typical properties of the
ground state, i.e. the internal energy and the entropy, are recovered in the $\beta \to \infty$ limit.
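
The identity behind the replica trick can be checked numerically on a toy example. The sketch below is not part of the calculation of the paper: it simply takes a handful of arbitrary positive numbers as stand-ins for the partition function over different samples, and compares $(\overline{Z^n} - 1)/n$ at a small real n with $\overline{\ln Z}$.

```python
import math


def replica_estimate(z_samples, n):
    """(mean(Z**n) - 1) / n, which tends to mean(ln Z) as n -> 0."""
    mean_zn = sum(z ** n for z in z_samples) / len(z_samples)
    return (mean_zn - 1.0) / n


# Arbitrary positive toy values standing in for Z[C] over disorder samples.
z_samples = [0.5, 2.0, 3.0, 10.0]
exact = sum(math.log(z) for z in z_samples) / len(z_samples)
approx = replica_estimate(z_samples, 1e-6)
```

The difference between `approx` and `exact` is of order n, which is the content of the limit $n \to 0$; the actual difficulty of the method lies in continuing $\overline{Z^n}$ from integer n to real n, which this toy check does not address.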

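To express the n-th moment of the partition function, it is convenient to use the multi-level gas formalism
proposed in [8]. The replicated theory is equivalent to a gas of N particles occupying $2^n$ levels labelled by
n-component binary vectors $\vec{\sigma} = (\sigma^1 = \pm 1, \sigma^2 = \pm 1, \ldots, \sigma^n = \pm 1)$. Calling $\rho(\vec{\sigma})$ the population, that is the fraction of
particles, on level $\vec{\sigma}$, the energy of the gas per particle reads, after some simple algebra exposed in the Appendix,

$$E_{gas}[\rho] = -\frac{\alpha}{\beta} (1-p) \ln \left[ \sum_{\vec{\sigma}, \vec{\tau}} \rho(\vec{\sigma}) \rho(\vec{\tau}) \exp\left( -\beta \sum_{a=1}^{n} \delta[\sigma^a ; 1] \, \delta[\tau^a ; 1] \right) \right]
 - \frac{\alpha}{\beta} \, p \, \ln \left[ \sum_{\vec{\sigma}, \vec{\tau}, \vec{\omega}} \rho(\vec{\sigma}) \rho(\vec{\tau}) \rho(\vec{\omega}) \exp\left( -\beta \sum_{a=1}^{n} \delta[\sigma^a ; 1] \, \delta[\tau^a ; 1] \, \delta[\omega^a ; 1] \right) \right] , \qquad (5)$$

with the symmetry constraint $\rho(\vec{\sigma}) = \rho(-\vec{\sigma})$. The stationary distribution $\rho_s$ of the level populations $\rho$ in the
thermodynamic limit $N \to \infty$ is obtained by balancing the above energetic interactions and the mixing entropy (per particle) [8]

$$S_{gas}[\rho] = -\sum_{\vec{\sigma}} \rho(\vec{\sigma}) \ln \rho(\vec{\sigma}) , \qquad (6)$$

that is by minimising $E_{gas}[\rho] - S_{gas}[\rho]/\beta$. The dominant contribution to $\overline{Z^n}$ is then given by

$$\overline{Z^n} \simeq \exp\left( -\beta N \left[ E_{gas}[\rho_s] - \frac{1}{\beta} S_{gas}[\rho_s] \right] \right) . \qquad (7)$$

The determination of the saddle-point $\rho_s(\vec{\sigma})$ is very difficult in general but can be performed under some simplifying
assumptions.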
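
To make the notion of level populations concrete, the following Python sketch computes, for n given spin configurations (the replicas), the fraction of sites found on each of the $2^n$ levels, together with the corresponding mixing entropy. The data layout (a list of n lists of N spins) and the function names are assumptions made for this example.

```python
import math
from collections import Counter


def populations(replicas):
    """Fraction of sites i whose replicated spin (S_i^1, ..., S_i^n) equals sigma.

    replicas: list of n spin configurations, each a list of N values +1/-1.
    Returns a dict {sigma (tuple of length n): fraction of sites}.
    """
    n_sites = len(replicas[0])
    counts = Counter(tuple(rep[i] for rep in replicas) for i in range(n_sites))
    return {sigma: c / n_sites for sigma, c in counts.items()}


def mixing_entropy(rho):
    """Entropy per particle, -sum rho ln rho, over the occupied levels."""
    return -sum(x * math.log(x) for x in rho.values() if x > 0)
```

With two replicas `[[1, 1, -1, -1], [1, -1, 1, -1]]` the four levels are equally populated, so the entropy per particle is ln 4, its maximal value for n = 2.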



C. The replica symmetric theory 

In the replica symmetric (RS) hypothesis, one looks for a stationary distribution $\rho_s(\sigma^1, \sigma^2, \ldots, \sigma^n)$ invariant under
any permutation of the n replicas. Therefore, $\rho_s(\vec{\sigma})$ depends on its argument through $\sum_{a=1}^{n} \sigma^a$ only. This allows the
introduction of a generating function R(z),

$$\rho_s(\sigma^1, \sigma^2, \ldots, \sigma^n) = \int_{-\infty}^{\infty} dz \, R(z) \prod_{a=1}^{n} \left( \frac{e^{\beta z \sigma^a / 2}}{e^{\beta z/2} + e^{-\beta z/2}} \right) , \qquad (8)$$

which becomes the Laplace transform of the populations $\rho_s$ in the limit $n \to 0$ [7]. Note that, since the sum of the
fractions $\rho$ equals one, R(z) is normalized to unity.
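
The normalization statement can be checked on a discretised version of the RS form: if R(z) is a sum of Dirac peaks with weights adding up to one, the resulting populations sum to one over the $2^n$ levels, and an even R(z) yields $\rho(\vec{\sigma}) = \rho(-\vec{\sigma})$. The sketch below assumes such a discrete R(z); the field values and weights in the usage note are arbitrary.

```python
import itertools
import math


def rs_weight(sigma, z, beta):
    """Product over replicas of exp(beta*z*s/2) / (exp(beta*z/2) + exp(-beta*z/2))."""
    norm = 2.0 * math.cosh(beta * z / 2.0)
    return math.prod(math.exp(beta * z * s / 2.0) / norm for s in sigma)


def rs_population(sigma, fields, weights, beta):
    """Population of level sigma for R(z) = sum_k weights[k] * delta(z - fields[k])."""
    return sum(w * rs_weight(sigma, z, beta) for z, w in zip(fields, weights))


def total_population(n, fields, weights, beta):
    """Sum of the populations over the 2**n levels; equals sum(weights)."""
    return sum(
        rs_population(s, fields, weights, beta)
        for s in itertools.product([-1, 1], repeat=n)
    )
```

For example, with `fields = [-1.0, 0.0, 1.0]` and `weights = [0.25, 0.5, 0.25]`, `total_population(3, fields, weights, 2.0)` equals one up to rounding errors.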

The minimisation condition over $\rho_s$ yields a self-consistent equation for the function R(z). In the limits of interest
$n \to 0$ and $\beta \to \infty$, this equation reads, see [7],

$$R(z) = \int_{-\infty}^{\infty} \frac{du}{2\pi} \cos(uz) \exp\left\{ -\alpha(1-p) + 2\alpha(1-p) \int_0^{\infty} dz_1 R(z_1) \cos(u \min(1, z_1)) \right.
 \left. - \frac{3}{4} \alpha p + 3 \alpha p \int_0^{\infty} dz_1 dz_2 R(z_1) R(z_2) \cos(u \min(1, z_1, z_2)) \right\} . \qquad (9)$$

The interpretation of R(z) is transparent within the cavity approach: it is the probability distribution of the effective
fields z seen by the spins [7]. In other words, R(z) accounts for the histogram $P(\langle\langle S \rangle\rangle)$ of the thermal average
values of the variables through the relation $\langle\langle S_i \rangle\rangle = \tanh(\beta z_i / 2)$.
IV. ANALYSIS OF THE CRITICAL REGION



A. The order parameter at threshold 

When $\alpha < \alpha_c(p)$, that is for weakly constrained formulas, the stable solution of (9) is $R(z) = \delta(z)$ since the number
of fully constrained spins in the ground state is not extensive in N [7,9]. Let us now fix p to a small value. According
to the discussion of the previous Section, the SAT/UNSAT transition is thought to be of the second order. We thus
consider some small (and even) fluctuations $\mu(z) = R(z) - \delta(z)$ around the solution of the SAT phase. From (9), we
find

$$\mu(z) = \int_0^{\infty} dz_1 \, L(z, z_1) \, \mu(z_1) + \int_0^{\infty} dz_1 dz_2 \, M(z, z_1, z_2) \, \mu(z_1) \mu(z_2) + O(\mu^3) , \qquad (10)$$

for all z, where

$$L(z, z_1) = \alpha (1-p) \sum_{\sigma_1 = \pm 1} \delta(z - \sigma_1 \min(1, z_1)) ,$$

$$M(z, z_1, z_2) = \frac{3}{2} \alpha p \sum_{\sigma_1 = \pm 1} \delta(z - \sigma_1 \min(1, z_1, z_2))
 + \frac{1}{2} \alpha^2 (1-p)^2 \sum_{\sigma_1, \sigma_2 = \pm 1} \delta(z - \sigma_1 \min(1, z_1) - \sigma_2 \min(1, z_2)) . \qquad (11)$$



Let us restrict to $z \in [0; 1[$ (see footnote 1). The inspection of the linear term in (10), with the kernel L given in (11),
shows that the threshold is given by (2). Next, we expand around the latter by setting $\alpha = \alpha_c(p) + x$,
$\mu(z) = x \, \eta(z) + O(x^2)$ and obtain, when $x \to 0$,

$$0 = (1-p) \, \eta(z) + \int_0^{\infty} dz_1 dz_2 \, \hat{M}(z, z_1, z_2) \, \eta(z_1) \eta(z_2) , \qquad (12)$$

where the kernel of the quadratic form reads

$$\hat{M}(z, z_1, z_2) = \frac{3p}{2(1-p)} \sum_{\sigma_1 = \pm 1} \delta(z - \sigma_1 \min(1, z_1, z_2))
 + \frac{1}{2} \sum_{\sigma_1, \sigma_2 = \pm 1} \delta(z - \sigma_1 \min(1, z_1) - \sigma_2 \min(1, z_2)) . \qquad (13)$$

Note that the positivity of the probability distribution R imposes $\eta(z) \ge 0$ for $z \ne 0$. Furthermore, the normalization
of R implies that

$$\int_{-\infty}^{\infty} dz \, \eta(z) = 0 . \qquad (14)$$

Consequently, $\eta(z)$ includes a Dirac peak in $z = 0$ with a negative weight $-\eta_0$, $\eta_0 > 0$.
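
The step leading from the linear term of the expansion to the threshold value can be spelled out as follows. This is a short worked derivation under our reading of the linear kernel, $L(z, z_1) = \alpha(1-p) \sum_{\sigma_1 = \pm 1} \delta(z - \sigma_1 \min(1, z_1))$, restricted to $0 < z < 1$.

```latex
% For 0 < z < 1 and z_1 >= 0, only sigma_1 = +1 with z_1 = z contributes:
\int_0^{\infty} dz_1 \, L(z, z_1) \, \mu(z_1)
  = \alpha (1-p) \int_0^{\infty} dz_1 \, \delta\bigl(z - \min(1, z_1)\bigr) \, \mu(z_1)
  = \alpha (1-p) \, \mu(z) .
% To first order in mu, the self-consistent equation therefore reads
\mu(z) = \alpha (1-p) \, \mu(z) ,
% which admits a non-zero solution only if
\alpha (1-p) = 1
  \quad \Longleftrightarrow \quad
  \alpha = \frac{1}{1-p} .
```

Below this value of $\alpha$ the only solution of the linearised equation is $\mu = 0$, i.e. $R(z) = \delta(z)$.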

B. Discretisation of the self-consistent equations 

Within the iterative scheme for the RS solution discussed in [7], we can discretise the above equation and look for
an exact solution of the form

$$\eta(z) = -\eta_0 \, \delta(z) + \sum_{\ell \ne 0} \eta_\ell \, \delta\left( z - \frac{\ell}{q} \right) . \qquad (15)$$

Footnote 1: Equation (9) is indeed a self-consistent constraint on R(z) on this range only, see [7].
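In the above equation, 1/q is the resolution of the effective field which eventually goes to zero. The self-consistent
equations for the coefficients $\eta_\ell$, $\ell = 0, 1, \ldots, q-1$, are easily obtained from (12),

$$(1-p) \, \eta_0 = \frac{3}{4} \frac{1-2p}{1-p} \, \eta_0^2 - \eta_0 \sum_{j=1}^{q-1} \eta_j + \sum_{j=1}^{q-1} \eta_j^2 + \left( \sum_{j=1}^{q-1} \eta_j \right)^2 , \qquad (16)$$

and, for $\ell = 1, \ldots, q-1$,

$$(1-p) \, \eta_\ell = \eta_\ell \left[ \eta_0 + \frac{3}{2} \frac{p}{1-p} \left( \eta_\ell - \eta_0 + 2 \sum_{j=1}^{\ell-1} \eta_j \right) \right]
 - \frac{1}{2} \sum_{j=1}^{\ell-1} \eta_j \eta_{\ell-j} - \sum_{j=1}^{q-\ell-1} \eta_j \eta_{\ell+j} + \eta_{q-\ell} \left( \sum_{j=1}^{q-1} \eta_j - \frac{1}{2} \eta_0 \right) . \qquad (17)$$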



C. Homogeneous equations at tricriticality

The onset of first order transition corresponds to the smallest value of p for which $\eta(z)$ diverges. Let us call $p_0(q)$
the tricritical point for a resolution of the field 1/q. When q = 1, equation (16) gives

$$\eta_0 = \frac{4 (1-p)^2}{3 (1-2p)} , \qquad (18)$$

leading to $p_0(1) = 1/2$. When increasing q, one gets smaller and smaller values for $p_0(q)$, e.g. $p_0(2) = 0.4614$,
$p_0(3) = 0.4484$, .... When approaching $p_0(q)$ from below, the weights of the Dirac peaks always diverge according to

$$\eta_\ell(p) \simeq \frac{\Omega_\ell}{p_0(q) - p} , \qquad p \to p_0(q)^- , \qquad (19)$$

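as can be explicitly checked on (18) for q = 1 and $\ell = 0$. Therefore, the amplitudes $\Omega_\ell$ have to satisfy the homogeneous
versions of equations (16, 17),

$$0 = \frac{3}{4} \frac{1-2p}{1-p} \, \Omega_0^2 - \Omega_0 \sum_{j=1}^{q-1} \Omega_j + \sum_{j=1}^{q-1} \Omega_j^2 + \left( \sum_{j=1}^{q-1} \Omega_j \right)^2 , \qquad (20)$$

and, for $\ell = 1, \ldots, q-1$,

$$0 = \Omega_\ell \left[ \Omega_0 + \frac{3}{2} \frac{p}{1-p} \left( \Omega_\ell - \Omega_0 + 2 \sum_{j=1}^{\ell-1} \Omega_j \right) \right]
 - \frac{1}{2} \sum_{j=1}^{\ell-1} \Omega_j \Omega_{\ell-j} - \sum_{j=1}^{q-\ell-1} \Omega_j \Omega_{\ell+j} + \Omega_{q-\ell} \left( \sum_{j=1}^{q-1} \Omega_j - \frac{1}{2} \Omega_0 \right) . \qquad (21)$$

The tricritical point $p_0$ is the smallest value of p for which the quadratic forms in (20, 21) have a non-zero solution
$\Omega_\ell$. In the above equations, we can choose $\Omega_0 = 1$ arbitrarily and we are left with q coupled equations for p and the
$q - 1$ amplitudes $\Omega_\ell$, $\ell = 1, \ldots, q-1$.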
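
The two lowest resolutions can be worked out explicitly. The Python sketch below rests on our reading of the homogeneous equations for q = 2 with $\Omega_0 = 1$ and $\Omega_1 = w$, which is an assumption of this example: the $\ell = 0$ equation becomes $3(1-2p)/(4(1-p)) - w + 2w^2 = 0$ and the $\ell = 1$ equation becomes $w \, [1/2 - b + (1+b) w] = 0$ with $b = 3p/(2(1-p))$, whose non-zero root is $w = (4p-1)/(2+p)$. The bisection helper is a generic routine written for this illustration.

```python
def eta0_q1(p):
    """Weight of the Dirac peak at z = 0 for q = 1; diverges as p -> 1/2."""
    return 4.0 * (1.0 - p) ** 2 / (3.0 * (1.0 - 2.0 * p))


def residual_q2(p):
    """l = 0 homogeneous equation for q = 2, with Omega_0 = 1 and
    Omega_1 = w fixed by the non-zero root of the l = 1 equation."""
    a = 3.0 * (1.0 - 2.0 * p) / (4.0 * (1.0 - p))
    w = (4.0 * p - 1.0) / (2.0 + p)
    return a - w + 2.0 * w * w


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p0_q2 = bisect(residual_q2, 0.4, 0.5)
```

Under these assumptions the root found between 0.4 and 0.5 reproduces the value $p_0(2) = 0.4614$ quoted in the text, with a positive amplitude $\Omega_1$, while the other branch $w = 0$ gives back the q = 1 result p = 1/2.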



D. Lower bound to the tricritical point 

We now focus upon the self-consistent equation (20) that we rewrite as follows,

$$\frac{5p - 2}{4 (1-p)} \, \Omega_0^2 = \sum_{j=1}^{q-1} \Omega_j^2 + \left( \frac{\Omega_0}{2} - \sum_{j=1}^{q-1} \Omega_j \right)^2 , \qquad (22)$$

from which the lower bound $2/5 \le p_0$ is immediately derived. Furthermore, this lower bound can be reached if and
only if $\eta(z)$ at the tricritical point vanishes outside of the interval $]-1; 1[$ and no Dirac distributions are present in
the continuous limit $q \to \infty$.
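
The rewriting used for the lower bound amounts to completing a square. The short derivation below assumes our reading of the $\ell = 0$ homogeneous equation and uses the shorthand $S = \sum_{j=1}^{q-1} \Omega_j$ and $Q = \sum_{j=1}^{q-1} \Omega_j^2$, which are notations introduced here.

```latex
% l = 0 homogeneous equation:
0 = \frac{3}{4}\,\frac{1-2p}{1-p}\,\Omega_0^2 - \Omega_0 S + Q + S^2 .
% Complete the square and split the prefactor:
-\Omega_0 S + S^2 = \Bigl(\frac{\Omega_0}{2} - S\Bigr)^2 - \frac{\Omega_0^2}{4} ,
\qquad
\frac{3}{4}\,\frac{1-2p}{1-p} = \frac{1}{4} - \frac{5p-2}{4(1-p)} .
% Hence
\frac{5p-2}{4(1-p)}\,\Omega_0^2 = Q + \Bigl(\frac{\Omega_0}{2} - S\Bigr)^2 \;\ge\; 0
\quad \Longrightarrow \quad p \ge \frac{2}{5} .
```

Equality requires both $Q = 0$ and $S = \Omega_0 / 2$; by the normalization of $\eta$, the latter means that no weight is left on fields $|z| \ge 1$.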



E. Upper bound to the tricritical point 

If $\{\Omega_\ell\}$ is a solution of equations (20, 21) for a given value of p at resolution q, then
$\{\Omega'_{2\ell} = \Omega_\ell, \, \Omega'_{2\ell+1} = 0\}$ is a solution for the same p at resolution 2q. Thus,
$p_0(q) \ge p_0(2q) \ge \cdots \ge p_0$ for any finite q, defining a sequence of more and more refined upper
bounds to $p_0$. We have then obtained the numerical values of $p_0(q)$ for $q = 1, \ldots, 120$. It appears that $p_0(q)$ indeed
decreases with q and equals 0.4158 for q = 120, giving a numerical upper bound to $p_0$.

The convergence of $p_0(q)$ down to its limit value $p_0$ is very slow and seems to display some power law effects, see
figure 1. At first sight, the numerical prediction for $p_0$ is close to 0.41, a value close to but higher than the lower
bound 2/5.
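
The refinement argument can be checked numerically. The sketch below encodes our reading of the homogeneous equations as a residual function for arbitrary q (an assumption of this example, as are the function names) and verifies that doubling the resolution, with the new odd amplitudes set to zero, leaves the even residuals unchanged and makes the odd ones vanish.

```python
def residuals(omega, p):
    """Right-hand sides of the homogeneous equations for omega[0..q-1].

    omega[0] is the (positive) amplitude of the peak at z = 0 and
    omega[l] the amplitude of the peak at z = l/q. A solution at
    tricriticality makes every entry vanish.
    """
    q = len(omega)
    a = 3.0 * (1.0 - 2.0 * p) / (4.0 * (1.0 - p))
    b = 3.0 * p / (2.0 * (1.0 - p))
    s = sum(omega[1:])
    res = [a * omega[0] ** 2 - omega[0] * s + sum(x * x for x in omega[1:]) + s * s]
    for l in range(1, q):
        term = omega[l] * (omega[0] + b * (omega[l] - omega[0] + 2.0 * sum(omega[1:l])))
        term -= 0.5 * sum(omega[j] * omega[l - j] for j in range(1, l))
        term -= sum(omega[j] * omega[l + j] for j in range(1, q - l))
        term += omega[q - l] * (s - 0.5 * omega[0])
        res.append(term)
    return res


def refine(omega):
    """Embed amplitudes at resolution q into resolution 2q (odd ones set to 0)."""
    out = []
    for x in omega:
        out.extend([x, 0.0])
    return out
```

For instance, with `omega = [1.0, w]` and any p, `residuals(refine(omega), p)` has the entries of `residuals(omega, p)` at even positions and zeros at odd positions, so a solution at resolution q is also a solution at resolution 2q.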



V. CONCLUSION 



To end with, a few observations are in order. The above results have been derived within an iterative RS scheme
allowing for more and more refined effective field resolutions. With the simplest choice of integer fields, the value of
$p_0$ would have been 1/2, a wrong result which tells us that there must exist other non-integer contributions to R(z).
The appearance of non-integer effective fields has recently been shown to reflect the existence of Replica Symmetry
Breaking (RSB). Further work will be necessary to elucidate the role of RSB effects on the structure of the solutions
(in principle, even the calculation of $p_0$ could be affected). The rigorous results discussed in ref. [11] show that the
RS solution is exact at least for $p \le 2/5$. Such probabilistic results are based on the convergence analysis of a
simple algorithm which proceeds by successive simplifications of the Boolean formula, obtained by fixing at random
one variable at a time. In ref. [11] it is shown that for $\alpha < \alpha_c(p)$ and $p \le 2/5$, the above algorithm has a finite
probability of finding a satisfying assignment and hence the starting formula has to be satisfiable with probability one
in the limit $N \to \infty$. For $p \le 2/5$ the 3-clauses are ineffective even for a rather trivial "dynamical process" like the
mentioned algorithm. Such a result is indeed consistent with the idea that the nature of the phase transition taking
place at $\alpha_c(p)$ does not change at least up to $p = 2/5$. In the case $p_0 > 2/5$, as suggested by the RS solution, it should
be of interest to understand how one should modify the algorithm in order to recover the statistical mechanics result.
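
A simplification procedure of this kind can be sketched as follows. This is a hedged illustration of a unit-clause style heuristic without backtracking, written for this text and not taken from ref. [11]: whenever a clause of length one is present its literal is satisfied, otherwise a free variable is fixed at random, and the formula is simplified after each assignment. Clauses are tuples of signed integers (+i for $x_i$, -i for its negation), a representation assumed here.

```python
import random


def unit_clause(formula, n_vars, rng):
    """Return a satisfying assignment {variable: bool}, or None if the
    successive simplifications produce an empty clause (no backtracking)."""
    clauses = [set(c) for c in formula]
    assignment = {}
    free = set(range(1, n_vars + 1))
    while free:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if units:
            lit = units[0]                      # forced by a unit clause
        else:
            var = rng.choice(sorted(free))      # free choice, at random
            lit = var if rng.random() < 0.5 else -var
        assignment[abs(lit)] = lit > 0
        free.discard(abs(lit))
        simplified = []
        for c in clauses:
            if lit in c:
                continue                        # clause satisfied, drop it
            reduced = c - {-lit}                # remove the falsified literal
            if not reduced:
                return None                     # empty clause: contradiction
            simplified.append(reduced)
        clauses = simplified
    return assignment
```

A returned assignment always satisfies the formula, whereas `None` only means that this particular run failed, not that the formula is unsatisfiable.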

Let us conclude by noticing that, from a physical point of view, the nature of the transition manifests itself through
the appearance of a finite fraction of completely constrained variables when crossing the threshold [7,12]. Above
$p_0$, this fraction discontinuously blows up at $\alpha_c$. The close correspondence between this fact and the onset of
computational complexity shown by simulations [9] suggests that the underlying mechanisms causing the increase
of the typical computational search cost could be related to the fact that search algorithms have to find the precise
values of an O(N) number of Boolean variables through an extensive enumeration.
APPENDIX A: CALCULATION OF THE EFFECTIVE GAS ENERGY



Consider n Boolean assignments $S^a$, where $a = 1, \ldots, n$, each comprised of N binary spins. The replica method
requires the computation of the average product of their Gibbs weights corresponding to energy (4),

$$z[S^a] = \overline{\exp\left( -\beta \sum_{a=1}^{n} E[C, S^a] \right)} , \qquad (A1)$$

which factorises over the sets of two- and three-clauses due to the absence of any correlation in their probability
distribution. Thus,

$$z[S^a] = \left( \zeta_2[S^a] \right)^{(1-p)M} \left( \zeta_3[S^a] \right)^{pM} . \qquad (A2)$$

The single clause factors in the above formula are defined by (for K = 2, 3)

$$\zeta_K[S^a] = \overline{\exp\left( -\beta \sum_{a=1}^{n} \delta\left[ \sum_{i=1}^{N} C_i S_i^a \, ; -K \right] \right)} , \qquad (A3)$$

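where the bar denotes the unbiased average over the set of $2^K \binom{N}{K}$ vectors of N components $C_i = 0, \pm 1$ and of squared
norm equal to K. Using the identity

$$\delta\left[ \sum_{i=1}^{N} C_i S_i^a \, ; -K \right] = \prod_{i \, / \, C_i \ne 0} \delta\left[ S_i^a \, ; -C_i \right] , \qquad (A4)$$

we carry out the average over the disorder in (A3) to obtain

$$\zeta_K[S^a] = \frac{1}{2^K} \sum_{C_1, \ldots, C_K = \pm 1} \frac{1}{N^K} \sum_{i_1, \ldots, i_K = 1}^{N} \exp\left\{ -\beta \sum_{a=1}^{n} \prod_{\ell=1}^{K} \delta\left[ S_{i_\ell}^a \, ; -C_\ell \right] \right\} \qquad (A5)$$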



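to the largest order in N. Defining $\rho(\vec{\sigma})$ as the fraction of sites i such that $(S_i^1, \ldots, S_i^n)$ equals $(\sigma^1, \ldots, \sigma^n)$ [8],
we rewrite $\zeta_K[S^a] = \zeta_K[\rho]$ with

$$\zeta_K[\rho] = \frac{1}{2^K} \sum_{C_1, \ldots, C_K = \pm 1} \; \sum_{\vec{\sigma}_1, \ldots, \vec{\sigma}_K} \rho(-C_1 \vec{\sigma}_1) \ldots \rho(-C_K \vec{\sigma}_K) \exp\left\{ -\beta \sum_{a=1}^{n} \prod_{\ell=1}^{K} \delta\left[ \sigma_\ell^a \, ; 1 \right] \right\} . \qquad (A6)$$

Notice that $\rho(\vec{\sigma}) = \rho(-\vec{\sigma})$ due to the even distribution of the disorder C. The final expression of the effective gas
energy per particle, defined as $E_{gas}[\rho] = -\log z[S^a] / (\beta N)$, is given in (5).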
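
The Kronecker identity used to carry out the disorder average can be verified by exhaustive enumeration for small N. The sketch below assumes the sign convention in which a clause is violated when every spin it involves equals minus the corresponding entry of C; the function names are hypothetical.

```python
import itertools


def lhs(C, S):
    """delta[sum_i C_i S_i ; -K], with K the number of non-zero entries of C."""
    K = sum(c * c for c in C)
    return 1 if sum(c * s for c, s in zip(C, S)) == -K else 0


def rhs(C, S):
    """Product over the sites with C_i != 0 of delta[S_i ; -C_i]."""
    return 1 if all(s == -c for c, s in zip(C, S) if c != 0) else 0


def identity_holds(n):
    """Check lhs == rhs for all C in {-1, 0, 1}^n and all S in {-1, 1}^n."""
    for C in itertools.product([-1, 0, 1], repeat=n):
        for S in itertools.product([-1, 1], repeat=n):
            if lhs(C, S) != rhs(C, S):
                return False
    return True
```

The identity holds because each non-zero term $C_i S_i$ is at least $-1$, so the sum can reach $-K$ only if all K terms equal $-1$.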
FIG. 1. Plot of $p_0(q)$ versus q. The dashed line is the lower bound $p_0 = 2/5$.